Development, evaluation and comparison of machine learning algorithms for predicting in-hospital patient charges for congestive heart failure exacerbations, chronic obstructive pulmonary disease exacerbations and diabetic ketoacidosis

Background Hospitalizations for exacerbations of congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD) and diabetic ketoacidosis (DKA) are costly in the United States. The purpose of this study was to predict in-hospital charges for each condition using machine learning (ML) models.
Results We conducted a retrospective cohort study on national discharge records of hospitalized adult patients from January 1st, 2016, to December 31st, 2019. We constructed six ML models (linear regression, ridge regression, support vector machine, random forest, gradient boosting and extreme gradient boosting) to predict the total in-hospital cost of admission for each condition. Our models had good predictive performance, with testing R-squared values of 0.701-0.750 (mean of 0.713) for CHF; 0.694-0.724 (mean of 0.709) for COPD; and 0.615-0.729 (mean of 0.694) for DKA. We identified key features driving costs, including patient age, length of stay, number of procedures, and elective/nonelective admission.
Conclusions ML methods may be used to accurately predict costs and identify drivers of high cost for COPD exacerbations, CHF exacerbations and DKA. Overall, our findings may inform future studies that seek to decrease the underlying high patient costs for these conditions.
Supplementary Information The online version contains supplementary material available at 10.1186/s13040-024-00387-9.

to the Centers for Medicare & Medicaid Services [1]. Healthcare costs are disproportionately concentrated among a small group of high-cost patients [2][3][4]. High-cost patients often have significant unmet critical healthcare needs despite the substantial healthcare costs they incur [5,6].
Congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD) and diabetes mellitus are life-altering, high-cost, high-volume conditions that affect millions of people and result in many hospitalizations per year [7]. According to Medical Expenditure Panel Survey data for 2017 to 2018 published by the American Heart Association (AHA), diabetes mellitus, heart disease, CHF and respiratory conditions, including COPD, were among the top 10 leading diagnoses for direct health expenditures [8].
CHF is one of the leading causes of hospitalization in the U.S., affecting 6 million adults as of 2018 and costing the nation an estimated $30.7 billion in 2012 according to the AHA, with these costs deriving largely from exacerbations requiring emergency visits and hospitalizations [8][9][10]. Similarly, COPD is a high-cost disease. As COPD progresses, patients often experience acute exacerbations, characterized by dyspnea, cough, sputum production and worsening lung function. COPD exacerbations cause frequent hospital admissions and readmissions, reportedly accounting for 90.3% of the total medical cost related to COPD and leading to US$32.1 billion in total medical cost [11,12]. Finally, diabetic ketoacidosis (DKA) is one of the acute, life-threatening complications of diabetes mellitus, a disease affecting 37.3 million people as of 2019 according to the CDC [13]. DKA is a common cause of hospitalization in patients with diabetes and is characterized by uncontrolled hyperglycemia, metabolic acidosis, and increased serum ketone concentrations [14,15].

Prior machine learning methods studying our outcomes: CHF, COPD, DKA
Machine learning (ML) techniques have emerged as a mechanism for analyzing high-dimensional medical data to understand the factors underlying patient-, hospital- and health system-level outcomes [16]. Specifically, for our three cohorts of patients, ML techniques have been utilized to identify at-risk patients, predict the risk of readmission and readmission rates, and predict the length of inpatient stay [11,12,17][18][19][20][21]. Work has been done to develop predictive models to identify major underlying drivers of high healthcare costs for patients in generalized cohorts as well as several other cohorts of patients, such as breast cancer patients and coronary artery bypass graft patients [22][23][24][25][26]. To date, however, robust machine learning algorithms for predicting in-hospital expenditures and the factors that influence them have not been evaluated in patients experiencing CHF exacerbations, COPD exacerbations or DKA.

Methods
The purpose of our study was to build and evaluate ML models to predict in-hospital charges associated with hospitalizations for these conditions, as this has not been done previously. Furthermore, based on the model output, we provide recommendations for model optimality in modeling in-hospital expenditures in each cohort and identify factors that underlie high-cost in-hospital admissions for each of the three diseases.
An overview of the methodology employed is shown in Fig. 1. All data processing and statistical and machine learning analyses were conducted on a MacBook Air (2022) equipped with an Apple M2 chip and 8 GB of unified memory, running macOS Sonoma (version 14.4.1). To optimize computational efficiency, we implemented parallel processing in R (version "Kick Things", released August 8, 2021) using the RStudio (version 1.4.1717) integrated development environment. We implemented models with the tidymodels, ranger, xgboost, glmnet and kernlab packages of R.
Fig. 1 Overview of Study Methodology: The HCUP-NIS 2016 Core, Severity Measures, Hospital Weights, and Cost Charge files were merged, and data related to hospital discharge and demographics were extracted as continuous, categorical and binary variables. ICD-10 comorbidity mappings from AHRQ were determined from ICD-10 codes. R code was written to extract, clean and analyze the HCUP-NIS data. Six ML models were then trained, evaluated, and validated for each of the three disease cohorts, and the best model for each disease cohort was determined

Dataset and Study Design
The National (Nationwide) Inpatient Sample (NIS) is a large, publicly available all-payer inpatient care database in the United States that contains data on more than seven million hospital discharges each year and is maintained as part of the Healthcare Cost and Utilization Project (HCUP) [27][28][29]. We used the HCUP-NIS Core, Severity, Hospital and Cost Charge datasets and queried the datasets for all hospitalizations between January 1, 2016, and December 31, 2019. Patients who were discharged from the hospital, patients aged < 18 years or who died were excluded. We identified patients who met the three disease conditions using the International Classification of Diseases version 10 (ICD-10) codes: 1) chronic obstructive pulmonary disease (COPD) exacerbation via the ICD-10 code J441; 2) congestive heart failure (CHF) exacerbation via the ICD-10 codes I5021, I5023, I5031, I5033, I5041, and I5043; and 3) diabetic ketoacidosis without coma (DKA) via the ICD-10 codes E1010, E1011, E1111, and E1110 [30]. Supplemental Table 1 shows the extracted ICD-10 codes and principal diagnoses for each of these conditions.
We identified a total of 26,190 unique discharges across the three conditions, including 9,552 discharges for COPD, 14,688 for CHF and 1,950 discharges for DKA. The primary outcome for this study was total in-hospital charges.
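The cohort definitions above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and the record field names (`dx`, `age`, `died`) and the toy records are hypothetical; only the ICD-10 codes are taken from the text.

```python
# ICD-10 codes copied from the cohort definitions in the text.
COHORT_CODES = {
    "COPD": {"J441"},
    "CHF": {"I5021", "I5023", "I5031", "I5033", "I5041", "I5043"},
    "DKA": {"E1010", "E1011", "E1110", "E1111"},
}


def assign_cohort(principal_dx):
    """Return the cohort name for a principal diagnosis code, or None."""
    code = principal_dx.replace(".", "").upper()  # accept dotted codes too
    for cohort, codes in COHORT_CODES.items():
        if code in codes:
            return cohort
    return None


def eligible(record):
    """Keep adults who survived the stay (hypothetical field names)."""
    return record["age"] >= 18 and not record["died"]


# Toy discharge records, invented for illustration only.
records = [
    {"dx": "J44.1", "age": 67, "died": False},
    {"dx": "I5023", "age": 80, "died": True},
    {"dx": "E1110", "age": 16, "died": False},
    {"dx": "E1010", "age": 34, "died": False},
]
cohorts = [assign_cohort(r["dx"]) for r in records if eligible(r)]
print(cohorts)
```

Only the first and last toy records pass the age and survival filters, so the sketch labels one COPD and one DKA discharge.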

Predictor Variables
We conducted a preliminary literature review to determine potential factors that may affect in-hospital charges and that could be used as predictors in our analysis. The initial set of predictors comprised 46 variables extracted from the HCUP-NIS dataset, 29 of which were unique ICD-10 diagnosis code groupings; the set covered demographic characteristics, hospital-related variables, health care utilization six months before index admission, and discharge-related variables. A brief description of each predictor variable is given in Supplemental Table 2. Further descriptions of the potential values of each variable can be found on the NIS Description of Data Elements page (https://www.hcup-us.ahrq.gov/db/nation/nis/nisdde.jsp).
The ICD-10 diagnosis codes were transformed into Agency for Healthcare Research and Quality (AHRQ) comorbidity categories using the icd R package. If a patient had at least one ICD-10 code in one of the AHRQ comorbidity categories, then they were considered positive for that category. A list of AHRQ comorbidity categories is shown in Supplemental Table 3.
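The "at least one code in a category" rule can be sketched as follows. This is a Python illustration, not the icd R package the study used, and the two-category prefix map is a hypothetical stand-in for the much larger AHRQ mapping.

```python
# Hypothetical category -> ICD-10 prefix map, for illustration only.
# The real AHRQ comorbidity mapping is far larger than this.
CATEGORY_PREFIXES = {
    "hypertension": ("I10",),
    "obesity": ("E66",),
}


def comorbidity_flags(dx_codes):
    """Flag a category as 1 if any diagnosis code falls under its prefixes."""
    flags = {}
    for category, prefixes in CATEGORY_PREFIXES.items():
        # str.startswith accepts a tuple of prefixes.
        flags[category] = int(any(code.startswith(prefixes) for code in dx_codes))
    return flags


flags = comorbidity_flags(["I5023", "I10", "E6601"])
print(flags)
```

Each category becomes one binary predictor per discharge, which is how the diagnosis codes enter the models as features.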

Univariate Analysis of Predictor Variables
The relationships between each of the predictor variables and total charges were analyzed using two-sample t tests. Statistical significance was determined at the 95% confidence level, with p < 0.05 indicating statistical significance. We also calculated the correlations between each pair of predictor variables in the dataset using the Pearson method. To reduce the number of variables without having to choose variables a priori, only variables with a Pearson correlation coefficient above 0.2 were visualized.
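The correlation screen can be sketched in a few lines of Python (the study itself used R). The variable names and values are invented, and applying the 0.2 threshold to the absolute value of the coefficient is an assumption; the text states only "above 0.2".

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data, invented for illustration only.
data = {
    "length_of_stay": [2, 4, 6, 8, 10],
    "n_procedures": [0, 1, 1, 3, 4],
    "age": [70, 55, 81, 62, 66],
}

# Correlate every pair of variables, then keep pairs with |r| > 0.2
# (using the absolute value is an assumption, see the lead-in).
pairs = {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}
shown = {pair: r for pair, r in pairs.items() if abs(r) > 0.2}
for pair, r in shown.items():
    print(pair, round(r, 3))
```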

Model Specification
We investigated six ML algorithms: linear regression (LM), ridge regression (Ridge), support vector machine (SVM), random forest (RF), gradient boosting (GBM) and extreme gradient boosting (XGB). These are popular models used in machine learning for healthcare classification and prediction. First, we preprocessed the variables using common feature engineering steps as described in the "Preprocessing and Feature Engineering of Predictor Variables" section. Then, we split the data for each condition into training and testing datasets, with 75% of the derivation sample for in-sample training and 25% for out-of-sample testing. Next, we performed hyperparameter tuning for our six algorithms using a randomized grid search and 5-fold cross-validation and determined the best hyperparameters as described in the "Hyperparameter Tuning" section. The final model with tuned hyperparameters for each algorithm was then fit to the training data and evaluated on the testing data as described in the "Model Finalization" section. We then evaluated the performance of each model as described in the "Model Performance Evaluation and Comparison" section and examined the final feature importance rankings as described in the "Assessment of Feature Importance" section.
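The 75%/25% split can be sketched as below. This is a Python illustration rather than the tidymodels code used in the study; the helper name, the fixed seed and the simple random (unstratified) shuffle are assumptions.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Row indices standing in for the 1,950 DKA discharges reported in the text.
rows = list(range(1950))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Every row lands in exactly one of the two sets, so the testing set stays unseen during tuning and fitting.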

Preprocessing and Feature Engineering of Predictor Variables
Due to the asymmetric distribution of characteristics and predictor variables, cases with missing data for any of the dependent or independent variables were excluded from this analysis, a common, though controversial, approach for dealing with missing values [31]. Then "one-hot encoding" was performed, transforming each categorical variable into a numerical dummy variable, a common preprocessing step to aid analyses with different ML models [32]. Next, within the dataset for each condition, variables with zero variance and those with large absolute correlations with other variables were identified and excluded from the datasets [33]. Finally, all continuous or numerical predictor variables were standardized such that their mean was 0 and standard deviation was 1 (Z-score standardization). This is a common preprocessing method used to decrease the likelihood of bias of the model due to very large or small numeric variables [34]. After preprocessing, the dataset for each condition contained the 46 predictor variables.
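Two of the preprocessing steps, one-hot encoding and Z-score standardization, can be sketched as follows. The sketch is in Python rather than R; the payer levels and length-of-stay values are invented, and the use of the sample standard deviation is an assumption.

```python
import statistics


def one_hot(values):
    """Turn one categorical column into a dict of 0/1 dummy columns."""
    levels = sorted(set(values))
    return {f"is_{lvl}": [int(v == lvl) for v in values] for lvl in levels}


def z_score(values):
    """Center to mean 0 and scale to standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD; population SD is an alternative
    return [(v - mean) / sd for v in values]


# Toy columns, invented for illustration only.
payer = ["medicare", "private", "medicaid", "medicare"]
length_of_stay = [3.0, 5.0, 7.0, 9.0]

encoded = one_hot(payer)
scaled = z_score(length_of_stay)
print(encoded)
print([round(v, 3) for v in scaled])
```

In a real pipeline the mean and standard deviation would be estimated on the training set and reused on the testing set, so that no information from the testing data enters the scaling.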

Hyperparameter Tuning
Where applicable, we performed a randomized grid search for hyperparameter tuning to optimize model performance, generalizability and robustness on unseen data [33]. An overview of the considered hyperparameters is displayed in Supplemental Table 4. Hyperparameter ranges were chosen based on those used in prior work [33,35,36]. Model performance for each hyperparameter permutation was assessed using fivefold cross-validation to determine the optimal settings that achieved the best balance between bias and variance. For each algorithm, the top-performing model was defined as the hyperparameter permutation that produced the best R-squared when fitted to the out-of-sample test dataset.
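A randomized search scored by fivefold cross-validation can be sketched with a one-predictor ridge regression. This is a simplified Python stand-in for the tidymodels tuning used in the study: the synthetic data, the penalty range, the 20 random candidates and the unshuffled interleaved folds are all assumptions. Candidates are ranked here by mean cross-validated R-squared, as in the "Model Finalization" section.

```python
import random


def fit_ridge_1d(x, y, penalty):
    """Closed-form ridge fit for one predictor with an unpenalized intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + penalty)
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def cv_score(x, y, penalty, k=5):
    """Mean held-out R-squared over k interleaved folds (no shuffling)."""
    idx = list(range(len(x)))
    scores = []
    for fold in range(k):
        held = set(idx[fold::k])
        tr = [i for i in idx if i not in held]
        slope, icpt = fit_ridge_1d([x[i] for i in tr], [y[i] for i in tr], penalty)
        te = sorted(held)
        preds = [slope * x[i] + icpt for i in te]
        scores.append(r_squared([y[i] for i in te], preds))
    return sum(scores) / k


# Synthetic data: charges rise with length of stay plus noise (illustrative).
rng = random.Random(1)
x = [float(i % 15 + 1) for i in range(100)]
y = [2000.0 * xi + rng.gauss(0, 3000) for xi in x]

# Randomized search: sample penalties log-uniformly, keep the best by CV.
candidates = [10 ** rng.uniform(-3, 3) for _ in range(20)]
best_penalty = max(candidates, key=lambda p: cv_score(x, y, p))
best_score = cv_score(x, y, best_penalty)
print(best_penalty, round(best_score, 3))
```

Selecting on cross-validated scores computed inside the training data keeps the testing set untouched until the final evaluation.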

Model Finalization
The hyperparameter combination with the best mean R-squared value across fivefold cross-validation was used in the final model for each algorithm for each condition. These final models were then fit to the training dataset and used to predict total charges for the testing dataset.

Model Performance Evaluation and Comparison
The performance of the final models was estimated by their R-squared and root-mean-square error (RMSE), which are common metrics used to measure the accuracy of prediction models [37,38]. R-squared is a measure of the goodness of fit of a model and has a maximum value of 1. Models with R-squared values closer to 1 fit the data better. RMSE measures the quality of predictions by determining how far predictions fall from measured true values using the Euclidean distance. It is a standard metric for measuring the error of a model, with smaller values indicating smaller prediction errors and thus higher accuracy. Model performance according to these two metrics was determined on the in-sample training set and the out-of-sample testing set. Top-performing models were determined based on R-squared estimates.
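Both metrics can be computed directly from observed and predicted charges. The following is a minimal Python sketch using the standard definitions; the four charge values are invented for illustration.

```python
import math


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Square root of the mean squared prediction error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Toy observed and predicted charges in US dollars (illustrative only).
observed = [10000.0, 12000.0, 20000.0, 18000.0]
predicted = [11000.0, 12000.0, 18000.0, 19000.0]

r2 = r_squared(observed, predicted)
err = rmse(observed, predicted)
print(round(r2, 4), round(err, 2))
```

RMSE is expressed in the units of the outcome (dollars here), whereas R-squared is unitless, which is why the two are reported together.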

Assessment of Feature Importance
The importance of the predictors in the final models was determined from their variable importance (VI) scores. VI scores demonstrate how much the prediction changes as the feature values vary, with higher scores indicating greater importance of the feature to the model prediction [39]. For linear models, the relative importance is determined by the absolute value of the t-statistic. For gradient boosting models, the relative importance is determined from the absolute value of the coefficients corresponding to the tuned model. Based on this relative feature importance, we visualized the twenty most influential features in VI plots (VIPs).

Sample characteristics
In total, 26,190 unique hospital discharge records with complete data were available for the analysis from January 1, 2016, to December 31, 2019: 14,688 patients hospitalized for CHF exacerbation, 9,552 patients hospitalized for COPD exacerbation and 1,950 patients hospitalized for DKA without coma. The characteristics of the sample cohorts are summarized in Table 1. The average costs for hospitalizations were US$18,196 (± $29,248) for CHF exacerbations, US$13,572 (± $17,598) for COPD exacerbations and US$13,650 (± $16,778) for DKA episodes. The mean length of stay and number of inpatient procedures were highest in the CHF cohort at 6.36 days and 1.90 procedures, respectively; the mean length of stay was 5.32 days in the COPD exacerbation cohort and 5.08 days in the DKA cohort, and the mean number of procedures was 1.32 for both COPD patients and DKA patients. As shown in Fig. 2, the mean charges for each condition increased steadily over the four-year period from 2016 to 2019.

Univariate analyses
Tables 2 and 3 show the univariable results for the categorical and continuous variables, respectively. A longer inpatient stay and greater number of procedures were associated with greater in-hospital total charges. Older patients also incurred higher total charges. For several features, such as sex, payment method, hospital bedsize, hospital control, hospital location, All Patients Refined Diagnosis Related Groups (APRDRG) severity score and APRDRG risk mortality score, the differences in total charges between groups of patients within each cohort were often statistically significant (for example, patients in large hospitals incurred greater charges than those in smaller hospitals in each disease cohort, p < 0.05). Notably, black patients incurred more charges than white patients did (p < 0.01).
The Pearson coefficients of the most correlated variables are visualized in Fig. 3. The data show that collinearity exists between several variables. For each of the three conditions, the number of procedures and APRDRG risk mortality were the most strongly positively correlated with the nondiagnosis variables (with correlation coefficients of 0.80 for CHF, 0.79 for COPD and 0.77 for DKA), while age and payment method were the most negatively correlated with the nondiagnosis variables (with correlation coefficients of -0.50 for CHF, -0.50 for COPD and -0.44 for DKA).
Length of stay had the highest VI score in the models for all conditions, indicating it was the most important predictor in each of the models. The number of procedures during hospitalization was consistently the second most important feature, with age and elective/nonelective admission also consistently being strong predictors across the models. This finding aligns with our univariable analyses (Tables 2 and 3).

Discussion
Although many studies have employed ML techniques to predict at-risk patients, readmission risks, readmission rates and length of stay for CHF, COPD and DKA patients, the development of a predictive model of in-hospital charges in these disease cohorts is a novel contribution of this study. We constructed six ML models that had good predictive performance. The R-squared values ranged from 0.659 to 0.681 with a mean of 0.668 during training and 0.701 to 0.750 with a mean of 0.713 during testing for CHF; from 0.680 to 0.710 with a mean of 0.687 during training and 0.694 to 0.724 with a mean of 0.709 during testing for COPD; and from 0.605 to 0.651 with a mean of 0.628 during training and 0.615 to 0.729 with a mean of 0.694 during testing for DKA. As such, on average, models performed similarly for all three conditions and performed better on the unseen testing data than on the training data.
Unsurprisingly, length of stay was the most important predictor in each of the models, disproportionately affecting hospital charges in each model. This was followed by the number of procedures performed during hospitalization. Age and elective/nonelective admission were also important predictors in at least one model for each disease condition. Feature selection indicates that although these variables are extremely influential in any model, many other patient-level and hospital-level features also have small but measurable impacts on hospital charges.

Strengths of our study
The strengths of our study include the large sample size of the HCUP NIS datasets. Furthermore, the availability of many demographic characteristics, diagnosis-related variables, and hospital characteristics for use as predictors allowed for the building of supervised prediction models. The use of advanced ML techniques represents the robust use of data science to characterize complex clinical issues. The ability to predict expenditures at the patient level with good accuracy can allow for targeted care by anticipating the health care needs of patients. This will provide insights into designing effective and tailored interventions to meet the needs of high-cost patients and reduce costs.

Limitations of our study
Despite its strengths, we recognize that this work has several limitations. Missing data are a well-known limitation of utilizing EMR data for research, to which the HCUP-NIS is susceptible. Additionally, we chose to use only complete data without missing values for all predictor variables, thereby eliminating a substantial number of possible discharge events. Future work can involve employing data imputation methods rather than data exclusion. This could help to address the potential selection bias that can result from categorically excluding cases with missing data.
Additionally, the discharge data used may include discharges from readmissions of the same patient. The NIS data contain discharge-level records, which, per the HCUP-NIS documentation, means that "individual patients who are hospitalized multiple times in one year may be present in the NIS multiple times… this will be especially important to remember for certain conditions for which patients may be hospitalized multiple times in a single year" [29,40]. As discussed, our target patients often experience numerous hospitalizations, and initial versus recurrent hospitalizations might differ in their character. As such, we considered limiting the analysis to initial discharge; however, "…there is no uniform patient identifier available that allows a patient-level analysis with the NIS." Therefore, for the purposes of this study, we included all the discharge data and performed the analysis at the discharge level.

Conclusion
We demonstrated the use of ML models to predict in-hospital charges for patients hospitalized for CHF exacerbation, COPD exacerbation and DKA. We found that length of stay, number of procedures during hospitalization, age and elective/nonelective admission were important predictors in these models for these diseases. This research can provide helpful information for medical management, which may decrease health insurance burdens in the future.

Fig. 4 Comparison of Evaluation Metrics: Comparison of the RMSE and R-squared values and their confidence intervals for each final model for each condition

Table 1
Patient sample characteristics

Table 3
Univariable results for continuous variables

Fig. 3 Correlation plots with Pearson coefficients for the variables in each disease condition dataset. Only those variables with a Pearson coefficient > 0.2 are displayed

Table 4
Comparison of the evaluation metrics of the ML models